Limit Synchronization in Markov Decision Processes

Authors

  • Laurent Doyen
  • Thierry Massart
  • Mahsa Shirmohammadi
Abstract

Markov decision processes (MDP) are finite-state systems with both strategic and probabilistic choices. After fixing a strategy, an MDP produces a sequence of probability distributions over states. The sequence is eventually synchronizing if the probability mass accumulates in a single state, possibly in the limit. Precisely, for 0 ≤ p ≤ 1 the sequence is p-synchronizing if a probability distribution in the sequence assigns probability at least p to some state, and we distinguish three synchronization modes: (i) sure winning, if there exists a strategy that produces a 1-synchronizing sequence; (ii) almost-sure winning, if there exists a strategy that produces a sequence that is, for all ε > 0, (1-ε)-synchronizing; (iii) limit-sure winning, if for all ε > 0 there exists a strategy that produces a (1-ε)-synchronizing sequence. We consider the problem of deciding whether an MDP is sure, almost-sure, or limit-sure winning, and we establish the decidability and optimal complexity for all modes, as well as the memory requirements for winning strategies. Our main contributions are as follows: (a) for each winning mode we present characterizations that give a PSPACE complexity for the decision problems, and we establish matching PSPACE lower bounds; (b) we show that for sure winning strategies, exponential memory is sufficient and may be necessary, and that in general infinite memory is necessary for almost-sure winning, while unbounded memory is necessary for limit-sure winning; (c) along with our results, we establish new complexity results for alternating finite automata over a one-letter alphabet.
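
The modes above all refer to the sequence of distributions X_0, X_1, X_2, ... obtained by pushing the initial distribution through the transition matrix that a fixed strategy selects. The sketch below is a minimal illustration of that evolution and of the p-synchronizing test; the toy matrix, initial distribution, finite horizon, and helper name are invented for illustration and are not taken from the paper.

    # Illustrative sketch (not the paper's algorithm): evolve the state distribution
    # of a toy MDP under a fixed memoryless strategy and test p-synchronization.
    import numpy as np

    # Row-stochastic matrix induced by the chosen strategy on states {0, 1, 2};
    # state 2 is absorbing, so the probability mass eventually accumulates there.
    M = np.array([[0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

    def is_p_synchronizing(initial, transition, p, steps=50):
        """True if some X_0, ..., X_steps puts probability mass >= p in a single state."""
        x = np.asarray(initial, dtype=float)
        for _ in range(steps + 1):
            if x.max() >= p:
                return True
            x = x @ transition   # X_{n+1} = X_n * M
        return False

    print(is_p_synchronizing([0.5, 0.5, 0.0], M, p=0.99))   # True: X_2 = (0, 0, 1)

In this sketch the strategy is memoryless and baked into M; the paper's results concern the general case, where sure winning may require exponential memory and almost-sure or limit-sure winning may require infinite or unbounded memory.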

Related Articles

Robust Synchronization in Markov Decision Processes

We consider synchronizing properties of Markov decision processes (MDP), viewed as generators of sequences of probability distributions over states. A probability distribution is p-synchronizing if the probability mass is at least p in some state, and a sequence of probability distributions is weakly p-synchronizing, or strongly p-synchronizing if respectively infinitely many, or all but finite...
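
Only the tail of the sequence matters for these two notions: weak synchronization asks that infinitely many distributions be p-synchronizing, strong synchronization that all but finitely many be. A finite horizon can only approximate those quantifiers, so the sketch below, with its invented matrix, cut-off, and helper name, is a rough proxy rather than a faithful check.

    # Rough finite-horizon proxy (an approximation by assumption) for weakly vs.
    # strongly p-synchronizing sequences of distributions.
    import numpy as np

    def synchronization_flags(initial, transition, p, horizon=200, tail_from=100):
        x = np.asarray(initial, dtype=float)
        hits = []
        for _ in range(horizon):
            hits.append(bool(x.max() >= p))
            x = x @ transition
        weakly = any(hits[tail_from:])    # proxy for "infinitely many X_n are p-synchronizing"
        strongly = all(hits[tail_from:])  # proxy for "all but finitely many X_n are p-synchronizing"
        return weakly, strongly

    M = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])
    print(synchronization_flags([1/3, 1/3, 1/3], M, p=0.9))   # (True, True): the mass is absorbed in state 2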

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
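
As a loose illustration of only the first step described here, the sketch below partitions the underlying graph of a toy MDP into SCCs and groups them into levels along the condensation DAG. It assumes networkx is available, and the edge list ("state s can reach state t under some action") is invented.

    # Loose sketch of the SCC-based decomposition step: build the condensation DAG
    # and read off levels; each level's restricted MDPs could then be solved in order.
    import networkx as nx

    G = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4)])

    C = nx.condensation(G)   # DAG whose nodes are the SCCs of G
    for i, level in enumerate(nx.topological_generations(C)):
        print(f"level {i}:", [sorted(C.nodes[scc]["members"]) for scc in level])
    # level 0: [[0, 1]]   level 1: [[2, 3]]   level 2: [[4]]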

Central-limit approach to risk-aware Markov decision processes

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that conver...
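
The idea quoted here, evaluating a fixed policy's risk over a long horizon via a central limit theorem, can be mimicked with a small Monte Carlo experiment: simulate long trajectories, fit a normal law to the total reward, and read off a tail probability. The two-state chain, rewards, horizon, and threshold below are invented for illustration and are not the authors' model or algorithm.

    # Hedged illustration: estimate the risk P(return < 0) of a fixed policy by
    # simulating long trajectories and applying a normal (CLT) approximation.
    import math
    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1],    # transition matrix induced by the fixed policy
                  [0.2, 0.8]])
    r = np.array([1.0, -2.0])    # per-step reward in each state

    def simulate_return(horizon=1000, start=0):
        s, total = start, 0.0
        for _ in range(horizon):
            total += r[s]
            s = rng.choice(2, p=P[s])
        return total

    samples = np.array([simulate_return() for _ in range(200)])
    mu, sigma = samples.mean(), samples.std(ddof=1)
    # Normal approximation of P(return < 0), i.e. the risk of a negative total reward.
    risk = 0.5 * (1.0 + math.erf((0.0 - mu) / (sigma * math.sqrt(2.0))))
    print(f"estimated risk of a negative return: {risk:.3f}")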

A functional central limit theorem for Markov additive arrival processes and its applications to queueing systems

We prove a functional central limit theorem for Markov additive arrival processes (MAAPs) where the modulating Markov process has the transition rate matrix scaled up by n^α (α > 0) and the mean and variance of the arrival process are scaled up by n. It is applied to an infinite-server queue and a fork-join network with a non-exchangeable synchronization constraint, where in both systems both the...

Publication date: 2014